Abstract



In this paper, we analyze the decision version of the NK landscape model from the 
perspective of threshold phenomena and phase transitions under two random distributions, 
the uniform probability model and the fixed ratio model. For the uniform probability 
model, we prove that the phase transition is easy in the sense that there is a polynomial 
algorithm that can solve a random instance of the problem with the probability asymptotic 
to 1 as the problem size tends to infinity. For the fixed ratio model, we establish several 
upper bounds for the solubility threshold, and prove that random instances with parameters 
above these upper bounds can be solved polynomially. This, together with our empirical 
study for random instances generated below and in the phase transition region, suggests 
that the phase transition of the fixed ratio model is also easy. 

1. Introduction 

The NK landscape is a fitness landscape model devised by Kauffman (1989). An appealing 
property of the NK landscape is that the "ruggedness" of the landscape can be tuned 
by changing some parameters. Over the years, the NK landscape model itself has been 
studied from the perspectives of statistics and computational complexity (Weinberger, 1996; 
Wright, Thompson, & Zhang, 2000). In the study of genetic algorithms, NK landscape
models have been used as a prototype and benchmark in the analysis of the performance of 
different genetic operators and the effects of different encoding methods on the algorithm's 
performance (Altenberg, 1997; Hordijk, 1997; Jones, 1995). 

In the field of combinatorial search and optimization, one of the interesting discoveries 
is the threshold phenomena and phase transitions. Roughly speaking, a phase transition in 
combinatorial search refers to the phenomenon that the probability that a random instance 
of the problem has a solution drops abruptly from 1 to 0 as the order parameter of the
random model crosses a critical value called the threshold. Closely related to this phase 
transition in solubility is the hardness of solving the problems. There has been strong empir- 
ical evidence and theoretical arguments showing that the hardest instances of the problems 
usually occur around the threshold and instances generated with parameters far away from 
the threshold are relatively easy. Since the seminal work of Cheeseman et al. (Cheese- 
man, Kanefsky, & Taylor, 1991), many NP-complete combinatorial search problems have 
been shown to have the phase transition and the associated easy-hard-easy pattern (Cook
& Mitchell, 1997; Culberson & Gent, 2001; Freeman, 1996; Gent, MacIntyre, Prosser, &
Walsh, 1998; Kirkpatrick & Selman, 1994; Mitchell, Selman, & Levesque, 1992; Vandegriend
& Culberson, 1998).

In this paper, we analyze the NK landscape model from the perspective of threshold 
phenomena and phase transitions. We establish two random models for the decision problem 
of NK landscapes and study the threshold phenomena and the associated hardness of the 
phase transitions in these two models. 

The rest of the paper is organized as follows. In Section 2, we introduce the NK fitness 
landscape and our probabilistic models, the uniform probability model and the fixed ratio 
model. In Section 3 and Section 4, the threshold phenomena and phase transitions in 
NK landscapes are analyzed. For the uniform probability model, we prove that the phase 
transition of the uniform probability model is easy in the sense that there is a polynomial 
algorithm that can solve a random instance of the problem with the probability asymptotic 
to 1 as the problem size tends to infinity. For the fixed ratio model, we establish two upper 
bounds for the solubility threshold, and prove that random instances with parameters above 
these upper bounds can be solved polynomially. This, together with our empirical study 
for random instances generated below and in the phase transition region, suggests that 
the phase transition of the fixed ratio model is also easy. In Section 5, we report our 
experimental results on typical hardness of the fixed ratio model. In Section 6, we conclude 
our investigation and discuss implications of our results. 

2. NK Landscapes and their Probabilistic Models 

An NK landscape $f(x) = \sum_{i=1}^{n} f_i(x_i, \Pi(x_i))$ is a real-valued function defined on binary strings
of fixed length, where $n > 0$ is a positive integer and $x = (x_1, \dots, x_n) \in \{0, 1\}^n$. It is the sum
of $n$ local fitness functions $f_i$, $1 \le i \le n$. Each local fitness function $f_i(x_i, \Pi(x_i))$ depends
on the main variable $x_i$ and its neighborhood $\Pi(x_i) \in P_k(\{x_1, \dots, x_n\} \setminus \{x_i\})$, where $P_k(X)$
denotes the set of all subsets of size $k$ from $X$. The most important parameters of an NK
landscape are the number of variables $n$ and the size of the neighborhood $k = |\Pi(x_i)|$.

In an NK landscape, the neighborhood $\Pi(x_i)$ can be chosen in two ways: the random
neighborhood, where the $k$ variables are randomly chosen from the set $\{x_1, \dots, x_n\} \setminus \{x_i\}$,
and the adjacent neighborhood, where the $k$ variables with indices nearest to $i$ (modulo $n$) are
chosen. For example, for any even integer $k$, the $k$ variables in $\Pi(x_i)$ can be defined as
$x_{((n+i-\frac{k}{2}) \bmod n)}, \dots, x_{((n+i+\frac{k}{2}) \bmod n)}$. Once the variables in the neighborhood are
determined, the local fitness function $f_i$ is determined by a fitness lookup table which specifies
the function value $f_i$ for each of the $2^{k+1}$ possible assignments to the variables $x_i$ and $\Pi(x_i)$.
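The lookup-table view of an NK landscape can be made concrete with a small sketch. The following is only an illustration under stated assumptions: the function names `random_nk_landscape` and `fitness` are hypothetical, fitness values are taken to be binary (as assumed later in this section), and the table entries are filled with fair coin flips rather than by either of the random models defined below.

```python
import itertools
import random

def random_nk_landscape(n, k, rng):
    """Return (neighborhoods, tables) for a binary-valued NK landscape.

    neighborhoods[i] holds the k indices in the neighborhood of x_i;
    tables[i] maps each (k+1)-bit tuple (x_i first, then the neighbors in
    sorted index order) to a fitness value in {0, 1}.
    """
    neighborhoods, tables = [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        neighborhoods.append(sorted(rng.sample(others, k)))
        # Illustrative assumption: each table entry is a fair coin flip.
        tables.append({bits: rng.randint(0, 1)
                       for bits in itertools.product((0, 1), repeat=k + 1)})
    return neighborhoods, tables

def fitness(x, neighborhoods, tables):
    """Evaluate f(x) as the sum of the n local fitness functions."""
    total = 0
    for i, (nbrs, table) in enumerate(zip(neighborhoods, tables)):
        key = (x[i],) + tuple(x[j] for j in nbrs)
        total += table[key]
    return total
```

With binary local values, `fitness(x, ...)` lies between 0 and n, and the decision problem asks whether some x reaches n.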

Throughout this paper, we consider NK landscapes with random neighborhoods. To 
simplify the discussion, we further assume that the local fitness functions take on binary 
values. Given an NK landscape $f$, the corresponding decision problem is stated as follows:
Is the maximum of $f(x)$ equal to $n$? An NK landscape decision problem is insoluble if there
is no solution for it. 

It has been proved that the NK landscape model is NP-complete for $k \ge 2$ (e.g.,
Weinberger, 1996; Wright et al., 2000). The proofs were based on a reduction from SAT to 
the decision problem of NK landscapes. To study the typical hardness of the NK landscape 
decision problems in the framework of thresholds and phase transitions, we introduce two 
random models. In both of the models defined below, the neighborhood set $\Pi(x_i)$ of a
variable $x_i$ is selected by randomly choosing without replacement $k = |\Pi(x_i)|$ variables
from $x \setminus \{x_i\}$.

Definition 2.1. The Uniform Probability Model $N(n,k,p)$: In this model, the fitness value
of the local fitness function $f_i(x_i, \Pi(x_i))$ is determined as follows: For each assignment
$y \in Dom(f_i) = \{0, 1\}^{k+1}$, let $f_i(y) = 0$ with probability $p$ and $f_i(y) = 1$ with
probability $1 - p$, where this is done for each possible assignment and each local fitness
function independently.

Definition 2.2. The Fixed Ratio Model $N(n,k,z)$: In this model, the parameter $z$ takes
on values from $[0, 2^{k+1}]$. If $z$ is an integer, we specify the local fitness function $f_i(x_i, \Pi(x_i))$
by randomly choosing without replacement $z$ tuples of possible assignments $Y = (y_1, \dots, y_z)$
from $Dom(f_i) = \{0, 1\}^{k+1}$, and defining the local fitness function as follows:

$$f_i(y) = \begin{cases} 0, & \text{if } y \in Y; \\ 1, & \text{otherwise.} \end{cases}$$

For a non-integer $z = (1 - \alpha)[z] + \alpha[z + 1]$, where $[z]$ is the integer part of $z$, we choose
randomly without replacement $[(1 - \alpha)n]$ local fitness functions and determine their fitness
values according to $N(n, k, [z])$. The rest of the local fitness functions are determined
according to $N(n, k, [z] + 1)$.
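The two definitions can be sketched as table generators. This is a minimal illustration, not the authors' generator: the function names are hypothetical, and the bracket in $[(1 - \alpha)n]$ is assumed here to mean rounding down.

```python
import itertools
import random

def uniform_table(k, p, rng):
    """Uniform probability model: each entry is 0 with probability p, else 1."""
    return {y: 0 if rng.random() < p else 1
            for y in itertools.product((0, 1), repeat=k + 1)}

def fixed_ratio_table(k, zeros, rng):
    """Exactly `zeros` entries, chosen without replacement, take the value 0."""
    domain = list(itertools.product((0, 1), repeat=k + 1))
    zero_set = set(rng.sample(domain, zeros))
    return {y: 0 if y in zero_set else 1 for y in domain}

def fixed_ratio_tables(n, k, z, rng):
    """Fixed ratio model for real z: a random subset of the n tables gets
    floor(z) zeros and the remaining tables get floor(z) + 1 zeros."""
    base = int(z)
    alpha = z - base
    n_low = int((1 - alpha) * n)  # assumption: [(1 - alpha) n] rounds down
    low = set(rng.sample(range(n), n_low))
    return [fixed_ratio_table(k, base if i in low else base + 1, rng)
            for i in range(n)]
```

For example, `fixed_ratio_tables(10, 2, 2.5, random.Random(1))` yields five tables with two zeros and five tables with three zeros.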

In the theory of random graphs, there are two related random models: $G(n,p)$, where
each of the $\frac{n(n-1)}{2}$ possible edges is included in the graph independently with probability
$p$, and $G(n,m)$, where exactly $m$ edges are chosen randomly and without replacement from
the set of $\frac{n(n-1)}{2}$ possible edges. It is well known that for most of the monotone graph
properties, results proved in $G(n,p)$ (or $G(n,m)$) also hold asymptotically for $G(n, Np)$
(correspondingly, $G(n, \frac{m}{N})$), where $N = \frac{n(n-1)}{2}$. However, we cannot expect that similar
relations exist between the two random models of NK landscapes defined above unless the
parameter $k$ tends to infinity. As a result, the asymptotic behaviors of the two NK landscape
models are significantly different for fixed $k$.

We conclude this section by establishing a relation between the decision problem of NK
landscapes and the SAT problem. A decision problem of the NK landscape

$$f(x) = \sum_{i=1}^{n} f_i(x_i, \Pi(x_i)),$$

"is the maximum of $f(x)$ equal to or greater than $n$?", can be reduced to a $(k+1)$-SAT problem
as follows:

(1) For each local fitness function $f_i(x_i, \Pi(x_i))$, construct a conjunction $C_i = \bigwedge_{j=1}^{z} C_i^j$
of clauses with exactly $k + 1$ variable-distinct literals from the set of variables $\{x_i, \Pi(x_i)\}$,
where $z$ is the number of zero values that $f_i$ takes and $C_i^j$ is such that for any assignment
$y_j \in \{0, 1\}^{k+1}$ that falsifies $C_i^j$, we have $f_i(y_j) = 0$.

(2) The $(k+1)$-SAT instance is the conjunction $\varphi = \bigwedge_{i=1}^{n} C_i$.
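The reduction can be sketched in a few lines. This is a hedged illustration: the function names and the literal encoding `(variable, is_positive)` are assumptions made for the example, not notation from the paper. Each zero entry of a lookup table contributes the one clause that is falsified exactly by that entry.

```python
import itertools

def table_to_clauses(variables, table):
    """Turn a binary lookup table into CNF clauses.

    `variables` lists the k+1 variable indices in table-key order; `table`
    maps each 0/1 tuple to 0 or 1. A literal is (index, is_positive), where
    is_positive=True means the variable appears un-negated.
    """
    clauses = []
    for y, value in table.items():
        if value == 0:
            # The clause is false only when every variable equals its bit in y.
            clauses.append(tuple((v, bit == 0) for v, bit in zip(variables, y)))
    return clauses

def satisfies(assignment, clauses):
    """True if every clause contains at least one true literal."""
    return all(any(assignment[v] == (1 if pos else 0) for v, pos in clause)
               for clause in clauses)
```

An assignment then satisfies all clauses of a table exactly when the table maps it to 1, which is the equivalence the reduction relies on.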
Table 1: A local fitness function and its equivalent 3-clauses.



Table 1 shows an example of the fitness assignment of a local fitness function fi = fi(x, y, z) 
and its associated equivalent 3-SAT clauses. It is easy to see that for any assignment s to 
the variables x,y,z, fi(s) = 1 if and only if the assignment satisfies the formula 

a; V j/ V z, x\/ y V z, iV y V z, x\l y\l z. 
3. Analysis of The Uniform Probability Model 

In the uniform probability model N(n,k,p), the parameter p determines how many zero 
values a local fitness function can take. We are interested in how the solubility and hardness 
of the NK landscape decision problem change as the parameter p increases from 0 to 1. It
turns out that for fixed p > 0, the decision problem is asymptotically trivially insoluble. 
This is quite similar to the phenomena in the random models of the constraint satisfaction 
problem observed by Achlioptas et al. (1997). 

To gain some more insight into the problem, we consider the case where $p = p(n)$ is a
function of the problem size $n$ with $\lim_{n} p(n) = 0$. Our analysis shows that the solubility of
the problem depends on how fast $p(n)$ decreases:

(1) If

$$\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} = +\infty, \qquad (3.1)$$

the problem is still asymptotically trivially insoluble because with probability asymptotic
to 1, there is at least one local fitness function that always has a fitness value 0;

(2) On the other hand, if $p(n)$ decreases fast enough, i.e.,

$$\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} < +\infty, \qquad (3.2)$$

the problem can be decomposed into a set of independent sub-problems. In either case the
problem can be solved in polynomial time. The case of (3.1) is not difficult to prove, but
to prove the case of (3.2), we need to make use of the following concepts and results.

Definition 3.1. The connection graph of an NK landscape instance $f(x) = \sum_{i=1}^{n} f_i(x_i, \Pi(x_i))$
is a graph $G = G(V, E)$ satisfying

(1) Each vertex $v \in V$ corresponds to a local fitness function; and

(2) There is an edge between $v_i, v_j$ if and only if the corresponding local fitness functions
$f_i, f_j$ share variables, i.e., the neighborhoods $\Pi(x_i)$ and $\Pi(x_j)$ of $x_i$ and $x_j$ have a non-empty
intersection, and both of them have at least one zero value.

Definition 3.2. Let $f(x) = \sum_{i=1}^{n} f_i(x_i, \Pi(x_i))$ be an NK landscape instance with the
connection graph $G = G(V, E)$. Let $G_1, \dots, G_l$ be the connected components of $G$. Since the
vertices of $G$ correspond to local fitness functions, we can regard $G_i$ as a set of local fitness
functions. For each $1 \le i \le l$, let $U_i \subset x = (x_1, \dots, x_n)$ be the set of variables that appear
in the definition of the local fitness functions in $G_i$.

It is easy to see that $(U_1, \dots, U_l)$ excluding independent vertices forms a disjoint partition
of (a subset of) the variables $x = (x_1, \dots, x_n)$, and that the local fitness functions in $G_i$
only depend on the variables in $U_i$. Furthermore, the NK decision problem is soluble if and
only if for each $1 \le i \le l$, there is an assignment $s_i \in \{0, 1\}^{|U_i|}$ to the variables in $U_i$ such
that for each local fitness function $g \in G_i$, $g(s_i) = 1$.
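The decomposition suggests a simple solver: group the constraining local fitness functions into components and brute-force each component separately. The sketch below is an illustration under assumptions, not the paper's algorithm verbatim: the name `solve_by_components` is hypothetical, and two functions are linked whenever their full variable sets (main variable included) overlap, which is a conservative reading of "share variables". Functions with no zero entry impose no constraint and are skipped.

```python
import itertools

def solve_by_components(n, scopes, tables):
    """Decide whether some x gives every local fitness function the value 1.

    scopes[i] lists the variables of f_i in table-key order; tables[i] maps
    0/1 tuples over scopes[i] to 0 or 1.
    """
    active = [i for i in range(n) if 0 in tables[i].values()]
    parent = {i: i for i in active}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # variable -> one active function that uses it
    for i in active:
        for v in scopes[i]:
            if v in owner:
                parent[find(i)] = find(owner[v])
            else:
                owner[v] = i

    groups = {}
    for i in active:
        groups.setdefault(find(i), []).append(i)

    for funcs in groups.values():
        variables = sorted({v for i in funcs for v in scopes[i]})
        pos = {v: idx for idx, v in enumerate(variables)}
        ok = any(
            all(tables[i][tuple(s[pos[v]] for v in scopes[i])] == 1 for i in funcs)
            for s in itertools.product((0, 1), repeat=len(variables))
        )
        if not ok:
            return False
    return True
```

The brute-force step is exponential in the number of variables of the largest component, which is why the size of that component matters for the running time.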

Theorem 3.1 summarizes the result on the uniform probability model. 

Theorem 3.1. For any $p(n)$ such that $\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}}$ exists, $k$ fixed, there is a polynomial
time algorithm that successfully solves a random instance of $N(n,k,p)$ with probability
asymptotic to 1 as $n$ tends to infinity.

Proof: We consider two cases: $\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} = +\infty$ and $\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} < +\infty$.

(1) The case of $\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} = +\infty$.

Let $A_i$ be the event that $f_i(y) = 0$ for each possible assignment $y \in \{0, 1\}^{k+1}$ and let
$A = \bigcup_{i=1}^{n} A_i$ be the event that at least one of the $A_i$'s occurs. We have

$$\lim_{n \to \infty} \Pr\{A\} = 1 - \lim_{n \to \infty} \Pr\Bigl\{\bigcap_{i=1}^{n} A_i^c\Bigr\} = 1 - \lim_{n \to \infty} \bigl(1 - p(n)^{2^{k+1}}\bigr)^n.$$

It can be shown that if $k$ is fixed and $\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} = +\infty$, then $\lim_{n \to \infty} \Pr\{A\} = 1$. It follows
that with probability asymptotic to one, there is at least one local fitness function which
takes on the value 0 for all possible assignments. We can therefore show that in this case, the
NK decision problem is insoluble by checking the local fitness functions one by one, and
this only takes linear time.
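The limit in case (1) follows from a standard exponential bound. A short derivation is sketched below, where $q_n$ is a symbol introduced here for the probability that a single local fitness function is identically zero:

```latex
q_n = p(n)^{2^{k+1}}, \qquad
\Pr\{A\} = 1 - (1 - q_n)^n, \qquad
(1 - q_n)^n \le e^{-n q_n},
\\[4pt]
n\, q_n = \Bigl( p(n)\, n^{\frac{1}{2^{k+1}}} \Bigr)^{2^{k+1}} \longrightarrow +\infty
\quad\Longrightarrow\quad
(1 - q_n)^n \to 0
\quad\Longrightarrow\quad
\Pr\{A\} \to 1 .
```

The first inequality is $1 - q \le e^{-q}$ applied $n$ times; the fixed exponent $2^{k+1}$ is what lets condition (3.1) be rewritten as $n q_n \to +\infty$.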

(2) The case of $\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} < +\infty$.

Consider an algorithm that first finds the connected components $G_i$, $1 \le i \le l$, of the
connection graph $G$ of the NK model, and then uses brute force to find an assignment
$s_i \in \{0, 1\}^{|U_i|}$ to the variables in $U_i$ such that for each local fitness function $g \in G_i$,
$g(s_i) = 1$. The time complexity of this algorithm is $O(n^2 + n \cdot 2^{M(n,k,p)})$, where $M(n,k,p) =
\max(|U_i|, 1 \le i \le l)$ is the maximum size of the subsets $(U_i, 1 \le i \le l)$ associated with the
connected components of the connection graph. To prove the theorem, we only need to show
that $M(n,k,p) \in O(\log n)$. In the following, we will show that for $\lim_{n} p(n)\, n^{\frac{1}{2^{k+1}}} < +\infty$,
we have

$$\lim_{n \to \infty} \Pr\{M(n,k,p) < 2^k + 2\} = 1.$$

Consider the connection graph G = G(V, E) of the NK model. It is a random graph and 
there is an edge between two nodes if and only if the two corresponding local fitness functions 
share variables and both of the local fitness functions take at least one zero as their fitness 
value. However, under this definition the edge probabilities are not independent. If $vx \in E$
then we know that $f_x$ has at least one zero and so the probability that $xw$ is in $E$ is greater
than if there were no other edge on $x$.

To deal with this we resort to the following proof construction. Let $C_m = \{v_1, \dots, v_m\}$
be a subset of $V$ of size $m$. Let $\pi$ be an ordering (permutation) of $v_1, \dots, v_m$. We say that
$C_m$ is variable connected with respect to the ordering $\pi$, denoted as $C(C_m, \pi)$, if for each
$i$, $2 \le i \le m$, there is either

1. a $j < i$ such that $f_{\pi(j)}$ and $f_{\pi(i)}$ share a variable; or

2. a $j$, $1 \le j < i$, such that the variable $x_j$ is one of the $k$ random variables in $f_i$.

Lemma. If the induced subgraph $G[C_m]$ is connected then there exists at least one ordering
$\pi$ of $v_1, \dots, v_m$ such that $C(C_m, \pi)$.

As proof, consider the ordering of vertices of any depth first search of a connected 
subgraph. In this case, the connections are all by case 1. 

The expected number of permutations $\pi$ for which $C(C_m, \pi)$ holds is

$$E_c = E[|\{\pi : C(C_m, \pi)\}|] = m!\, \Pr\{C(C_m, \pi)\}.$$

We then observe that the expected number of connected induced graphs on $m$ vertices is
less than $p_0^m \binom{n}{m} E_c$, where $p_0$ is the probability that $f_i$ takes at least one value zero. We
show this value goes to zero in the limit if $m \ge 2^k + 2$. Finally, since if there is a connected
subgraph on $m$ vertices then there must be one for each $i < m$, it follows that the largest
connected component has size at most $2^k + 1$.

For a randomly generated permutation $\pi$ of $C_m$, let $C_i$ be the set of the first $i$ vertices
of the permutation. For $i \ge 2$ define $P_i$ to be the probability that $f_{\pi(i)}$ shares at least one
variable with $f_{\pi(j)}$ for some $j < i$, given that $C(C_{i-1}, \pi/1, \dots, i-1)$. Let $P_1 = 1$. (A one
vertex subgraph is always connected.)

For $i > 1$ we have $P_i = \Pr\{\exists j < i$, $f_{\pi(j)}$ and $f_{\pi(i)}$ share variables, given $C(C_{i-1}, \pi)$, or
one of the $k$ random variables in $f_{\pi(i)}$ is in $\{x_1, \dots, x_m\} - \{x_i\}\}$, and

$$\Pr\{C(C_m, \pi)\} = \prod_{i=1}^{m} P_i.$$

Finally, for $i > 1$ we note that $C_{i-1}$ has at most $(i-1)k$ distinct other variables. If $C_{i-1}$
is connected then the number of variables may be less than this. Thus,

$$P_i \le 1 - \frac{\binom{n - k(i-1) - m}{k}}{\binom{n-1}{k}}.$$

The combinatorial part reduces to

$$\frac{(n - k(i-1) - m) \cdots (n - k(i-1) - m - k + 1)}{(n-1) \cdots (n-k)} \ge \left(\frac{n - ki - m + 1}{n - 1}\right)^{k}.$$

So, $\Pr\{C(C_m, \pi)\}$ is

$$\le \prod_{i=2}^{m} \left[1 - \left(\frac{n - ki - m + 1}{n - 1}\right)^{k}\right] \le \left[1 - \left(1 - \frac{km + m - 2}{n - 1}\right)^{k}\right]^{m-1} \in O\!\left(\frac{1}{n^{m-1}}\right), \quad m, k \text{ fixed}.$$

Noting that $p_0 \in O\bigl(n^{-\frac{1}{2^{k+1}}}\bigr)$, we see that the bound $p_0^m \binom{n}{m} E_c$ on the expected number of
connected subgraphs of size $m$ goes to zero if $m = 2^k + 2$. It follows that $M(n,k,p)$ is less
than $2^k + 2$ with probability asymptotic to 1. This completes the proof.

4. Analysis of The Fixed Ratio Model 

As has been discussed in the previous section, the uniform probability model N(n,k,p) of 
NK landscapes is asymptotically trivial. Part of the cause of this asymptotic triviality lies in 
the fact that if the parameter p does not decrease very quickly with n, then asymptotically 
there will be at least one local fitness function that takes the value 0 for all the possible
assignments, making the whole decision problem insoluble. In this section, we study the
fixed ratio model N(n,k,z). In this model, we require that each local fitness function has
a fixed number of zero values so that the trivially insoluble situation in the uniform probability
model is avoided. We note that the same idea has been used in the study of the flawless 
CSP (Gent et al., 1998). 

Recall that in the fixed ratio model, we choose the neighborhood structure for each local
fitness function in the same way as in the uniform probability model $N(n,k,p)$. To determine the
fitness value for a local fitness function $f_i$, we randomly select without replacement exactly
$z$ tuples $\{s_1, \dots, s_z\}$ from $\{0, 1\}^{k+1}$, and let $f_i(s_j) = 0$ for each $1 \le j \le z$ and $f_i(s) = 1$ for
every other $s \in \{0, 1\}^{k+1}$.

For the fixed ratio model, we are interested in how the probability of an instance of
$N(n,k,z)$ being soluble changes as the parameter $z$ increases from 0 to $2^{k+1}$. It is easy to
see that the property "There exists an assignment $x$ such that $f(x) = \sum_{i=1}^{n} f_i(x_i, \Pi(x_i)) = n$"
is monotone in the parameter $z$, the number of tuples at which a local fitness function takes
zero. Actually, we have the following Lemma on the property of the solubility probability
of the fixed ratio model:

Lemma 4.1. For the fixed ratio model, if $z_1 > z_2$, then

$$\Pr\{N(n,k,z_1) \text{ is soluble}\} \le \Pr\{N(n,k,z_2) \text{ is soluble}\}.$$

Furthermore, we have



Based on the above Lemma and in parallel to the study of the threshold phenomena in
other random combinatorial structures such as 3-coloring of random graphs and random
3-SAT, we suggest the following conjecture:

Conjecture 4.1. There exists a threshold $z_c$ such that

$$\lim_{n \to \infty} \Pr\{N(n,k,z) \text{ is soluble}\} = \begin{cases} 1, & \text{if } z < z_c; \\ 0, & \text{if } z > z_c. \end{cases}$$

Conjectures like this are the starting point of the study of phase transition in many
random combinatorial structures such as 3-coloring of random graphs and random SAT,
but the existence of the thresholds is still an open question (Achlioptas, 1999; Cook &
Mitchell, 1997). However, bounding the thresholds has been an important topic in the
study of phase transition (Achlioptas, 1999, 2001; Dubois, 2001; Franco & Gelder, 1998;
Franco & Paul, 1983; Frieze & Suen, 1996; Kirousis, Kranakis, Krizanc, & Stamatiou,
1994). In this section, we will establish two upper bounds on the threshold of the parameter
$z_c$, and theoretically prove that random instances generated with the parameter $z$ above
these upper bounds can be solved with probability asymptotic to 1 by polynomial (even
linear) algorithms.

Characterizing the sharpness of the thresholds is also of great interest in the study of
the phase transition. After proving that every monotone graph property has a threshold
behavior (Friedgut & Kalai, 1996), Friedgut (1999) established a necessary and sufficient
condition for a monotone graph property to have a sharp threshold, which has been used to
prove the sharpness of the thresholds of 3-colorability and 3-SAT problems (Friedgut, 1999;
Achlioptas, 1999). For the fixed ratio model discussed in this paper, we suspect that it will
exhibit a coarse threshold behavior, and would like to leave a detailed investigation into
this problem as a future research direction.

4.1 The Upper Bound of z = 3.0

The derivation of this upper bound is based on the concept of a conflicting pair of local
fitness functions. We say that two local fitness functions $f_i$ and $f_j$ conflict with each other
if

1. $f_i$ and $f_j$ share at least one variable $x$; and

2. For any assignment $s \in \{0, 1\}^n$, we have $f_i(s) f_j(s) = 0$.

It is obvious that an instance of the NK decision problem is insoluble if there exists a pair 
of conflicting local fitness functions. 

Based on the second moment method in the theory of probability (Alon & Spencer,
1992), we can prove the following upper bound result. As it takes linear time to check if
there is a pair of conflicting local fitness functions, we can see that the fixed ratio model
N(n, 2, z) is linearly solvable when z > 3.0.
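The conflict test itself is easy to state in code. The sketch below is a direct, unoptimized check written for illustration: the function names are hypothetical, and no attempt is made to reproduce the linear running time mentioned in the text. Two functions conflict if they share a variable and no joint assignment of their variables gives both of them the value 1.

```python
import itertools

def conflict(scope_i, table_i, scope_j, table_j):
    """True if f_i and f_j share a variable and f_i(s) * f_j(s) = 0 for all s."""
    if not set(scope_i) & set(scope_j):
        return False
    variables = sorted(set(scope_i) | set(scope_j))
    pos = {v: idx for idx, v in enumerate(variables)}
    for s in itertools.product((0, 1), repeat=len(variables)):
        vi = table_i[tuple(s[pos[v]] for v in scope_i)]
        vj = table_j[tuple(s[pos[v]] for v in scope_j)]
        if vi == 1 and vj == 1:
            return False
    return True

def has_conflicting_pair(scopes, tables):
    """Check every pair of functions that share at least one variable."""
    by_var = {}
    for i, scope in enumerate(scopes):
        for v in scope:
            by_var.setdefault(v, []).append(i)
    seen = set()
    for funcs in by_var.values():
        for a, b in itertools.combinations(funcs, 2):
            if (a, b) not in seen:
                seen.add((a, b))
                if conflict(scopes[a], tables[a], scopes[b], tables[b]):
                    return True
    return False
```

Only pairs that share a variable are examined, so for a fixed neighborhood size the work per pair is constant.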

Theorem 4.1. Define $A$ to be the event that there is a conflicting pair of local fitness
functions in $N(n, 2, z)$. For the fixed ratio model $N(n, 2, z)$ with $z = 3.0 + \varepsilon$, we have

$$\lim_{n \to \infty} \Pr\{A\} = 1$$

and thus the problem is insoluble with probability asymptotic to 1.

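Proof: Without loss of generality, we may write $f$ as $f = \sum_{i=1}^{n} f_i$, where $f_i$ has 4 zeroes in its fitness
value assignment for $1 \le i \le \varepsilon n$, and 3 zeroes for $\varepsilon n + 1 \le i \le n$. Let $I_{ij}$ be the indicator
function of the event that $f_i$ and $f_j$ conflict with each other, i.e.,

$$I_{ij} = \begin{cases} 1, & \text{if } f_i \text{ and } f_j \text{ conflict with each other;} \\ 0, & \text{else,} \end{cases}$$

and $S = \sum_{1 \le i,j \le \varepsilon n} I_{ij}$. We claim that $\lim_{n \to \infty} \Pr\{S = 0\} = 0$.

By Chebyshev's inequality, we have

$$\Pr\{S = 0\} \le \Pr\{|S - E(S)| \ge E(S)\} \le \frac{Var(S)}{(E(S))^2}. \qquad (4.3)$$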

Since for each $1 \le i \le \varepsilon n$, $f_i$ has exactly 4 zeros in its fitness value assignment, we know
that two local fitness functions $f_i, f_j$, $1 \le i, j \le \varepsilon n$, conflict with each other if and only if
they have exactly one common variable $x$ such that one of the following is true: (1) $f_i(s) =
0$ (or 1), $f_j(s) = 1$ (or 0) for all the assignments $s$ such that $x = 1$ (respectively $x = 0$); or
(2) $f_i(s) = 1$ (or 0), $f_j(s) = 0$ (or 1) for all the assignments $s$ such that $x = 1$ (respectively
$x = 0$).

Since two local fitness functions share at least one variable with a probability of order $1/n$,
we have

$$\Pr\{I_{ij} = 1\} \in \Omega\!\left(\frac{1}{n}\right), \quad \varepsilon > 0,\ 1 \le i, j \le \varepsilon n, \qquad (4.4)$$

and hence,

$$E(S) = \sum_{1 \le i,j \le \varepsilon n} E\{I_{ij}\} = \sum_{1 \le i,j \le \varepsilon n} \Pr\{I_{ij} = 1\} \in \Omega(n).$$

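We now consider the variance of $S$. Since $S = \sum_{1 \le i,j \le \varepsilon n} I_{ij}$, we have

$$Var(S) = \sum_{i,j} Var(I_{ij}) + 2 \sum_{(i,j) \ne (l,m)} \bigl[E\{I_{ij} I_{lm}\} - E\{I_{ij}\} E\{I_{lm}\}\bigr].$$

Let

$$A_1 = \frac{\sum_{i,j} Var(I_{ij})}{(E(S))^2}$$

and

$$A_2 = \frac{2 \sum_{(i,j) \ne (l,m)} \bigl[E\{I_{ij} I_{lm}\} - E\{I_{ij}\} E\{I_{lm}\}\bigr]}{(E(S))^2}.$$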
It is easy to see that $\lim_{n \to \infty} A_1 = 0$. To prove $\lim_{n \to \infty} A_2 = 0$, we consider two cases:

Case 1: $i \ne j \ne m \ne l$. In this case, the two random variables $I_{ij}$ and $I_{lm}$ are actually
independent. It follows that $E\{I_{ij} I_{lm}\} - E\{I_{ij}\} E\{I_{lm}\} = 0$.

Case 2: $(i,j) \ne (l,m)$, but they have one index in common, say $j = l$. In this case, we have

$$E\{I_{ij} I_{lm}\} - E\{I_{ij}\} E\{I_{lm}\} = \Pr\{I_{ij} = 1\} \Pr\{I_{jm} = 1 \mid I_{ij} = 1\} - \Omega\!\left(\frac{1}{n^2}\right)
= \Omega\!\left(\frac{1}{n}\right) \Pr\{I_{jm} = 1 \mid I_{ij} = 1\} - \Omega\!\left(\frac{1}{n^2}\right).$$

Given that $f_i$ and $f_j$ conflict with each other, the conditional probability that $f_j$ and $f_m$
conflict with each other is still in $\Omega(\frac{1}{n})$.

Since there are only $C_n^3$ pairs of $I_{ij}$ and $I_{jm}$ satisfying the condition in Case 2, we know
that $\sum \bigl[E\{I_{ij} I_{lm}\} - E\{I_{ij}\} E\{I_{lm}\}\bigr]$ is in $O(n)$, and therefore $\lim_{n \to \infty} A_2 = 0$. It follows
that

$$\lim_{n \to \infty} \Pr\{S = 0\} \le \lim_{n \to \infty} \frac{Var(S)}{(E(S))^2} = 0.$$

Since the event $\{S > 0\}$ implies that there exists a conflicting pair of local fitness functions,
the theorem follows. □


4.2 2-SAT Sub-problems in N(n,2,z) and a Tighter Upper Bound 

In this subsection, we establish a tighter upper bound z > 2.837 for the threshold of the fixed 
ratio model N(n, 2, z) by showing that asymptotically N(n, 2, z) contains an unsatisfiable 
2-SAT sub-problem with probability 1 for any value of z greater than 2.837. This also
gives us a polynomial time algorithm which determines that N(n, 2, z) is insoluble with 
probability asymptotic to 1 for z > 2.837. 
Recall from Section 2 that each instance of $N(n, 2, z)$ has an equivalent 3-SAT instance.
The idea is to show that with probability asymptotic to 1, an instance of $N(n, 2, z)$ will
contain a set of specially structured 3-clauses, called a t-3-module (Definition 10.3, Franco
& Gelder, 1998):

$$\mathcal{M} = \{M_1, \dots, M_{3p+2}\}, \quad t = 3p + 2,$$

where

$$\begin{aligned}
M_1 &= (\bar{u}_1 \vee u_2 \vee z_1,\ \bar{u}_1 \vee u_2 \vee \bar{z}_1); \\
&\ \ \vdots \\
M_{p-1} &= (\bar{u}_{p-1} \vee u_p \vee z_{p-1},\ \bar{u}_{p-1} \vee u_p \vee \bar{z}_{p-1}); \\
M_p &= (\bar{u}_p \vee \bar{u}_0 \vee z_p,\ \bar{u}_p \vee \bar{u}_0 \vee \bar{z}_p); \\
M_{p+1} &= (\bar{u}_{p+1} \vee u_{p+2} \vee z_{p+1},\ \bar{u}_{p+1} \vee u_{p+2} \vee \bar{z}_{p+1}); \\
&\ \ \vdots \\
M_{3p-1} &= (\bar{u}_{3p-1} \vee u_{3p} \vee z_{3p-1},\ \bar{u}_{3p-1} \vee u_{3p} \vee \bar{z}_{3p-1}); \\
M_{3p} &= (\bar{u}_{3p} \vee u_0 \vee z_{3p},\ \bar{u}_{3p} \vee u_0 \vee \bar{z}_{3p}); \\
M_{3p+1} &= (\bar{u}_0 \vee u_1 \vee z_{3p+1},\ \bar{u}_0 \vee u_1 \vee \bar{z}_{3p+1}); \\
M_{3p+2} &= (u_0 \vee u_{p+1} \vee z_{3p+2},\ u_0 \vee u_{p+1} \vee \bar{z}_{3p+2});
\end{aligned}$$

and $u_1, \dots, u_{3p+1}, z_1, \dots, z_{3p+1}$ are binary variables. Notice that a t-3-module can be
reduced to a 2-SAT problem containing two contradictory cycles and hence is unsatisfiable.
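The unsatisfiability of a t-3-module can be checked by brute force for small $p$. The sketch below rests on an assumption that should be read carefully: the negation pattern of the $u$ literals is chosen here so that the two implication cycles $u_0 \Rightarrow \bar{u}_0$ and $\bar{u}_0 \Rightarrow u_0$ arise, which is one pattern consistent with the "two contradictory cycles" description; the variable numbering and function names are hypothetical.

```python
import itertools

def t3_module(p):
    """Build the 2(3p+2) three-literal clauses of a t-3-module.

    Variables 0..3p stand for u_0..u_{3p}; variables 3p+1..6p+2 stand for
    z_1..z_{3p+2}. A literal is (variable, is_positive). The polarities of
    the u literals are an assumption made for this illustration.
    """
    def u(i, positive=True):
        return (i, positive)

    pairs = {}
    for i in range(1, 3 * p):
        if i != p:
            pairs[i] = (u(i, False), u(i + 1))      # u_i implies u_{i+1}
    pairs[p] = (u(p, False), u(0, False))           # u_p implies not u_0
    pairs[3 * p] = (u(3 * p, False), u(0))          # u_{3p} implies u_0
    pairs[3 * p + 1] = (u(0, False), u(1))          # u_0 implies u_1
    pairs[3 * p + 2] = (u(0), u(p + 1))             # not u_0 implies u_{p+1}

    clauses = []
    for m, pair in sorted(pairs.items()):
        z = 3 * p + m
        clauses.append(pair + ((z, True),))
        clauses.append(pair + ((z, False),))
    return clauses

def satisfiable(clauses, num_vars):
    """Exhaustive satisfiability test, suitable only for tiny instances."""
    for s in itertools.product((False, True), repeat=num_vars):
        if all(any(s[v] == pos for v, pos in c) for c in clauses):
            return True
    return False
```

For $p = 1$ the module has 9 variables and 10 clauses, so `satisfiable(t3_module(1), 9)` can be evaluated directly; dropping either clause pair of a single 3-module breaks one of the cycles.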

The result is proved in two steps. In the first step, it is shown that for z > 2.837 the 
average number of t-3-modules contained in N(n, 2, z) tends to infinity as n increases. In 
the second step, we use a result established by Alon and Spencer (1992) on the second 
moment method to prove that for z > 2.837 the probability that N(n, 2, z) contains at least 
one t-3-module tends to 1. 

Let us start with the first step to show that the average number of t-3-modules contained 
in N(n, 2, z) tends to infinity as n increases. 

Definition 4.1. Given a t-3-module $\mathcal{M}$ and an NK landscape instance $f = \sum_{i=1}^{n} f_i$, $k = 2$, a
sequence of local fitness functions

$$g = (g_1, \dots, g_t) \subset (f_1, \dots, f_n)$$

is said to be a possible match (PM) if for each $1 \le m \le t$, the main variable of $g_m$ is one of
the three variables that occur in the 3-module $M_m$. A subsequence $(h_1, \dots, h_l)$ of a possible
match $g$ is legal if for any $1 \le m < j \le l$, $h_m \ne h_j$.

Lemma 4.2. Let $f(x) = \sum_{i=1}^{n} f_i(x_i, \Pi(x_i))$ be an instance of $N(n, 2, z)$ and $\mathcal{M}$ be a
t-3-module. Then the number of possible matches for the t-3-module $\mathcal{M}$ is $3^t$. Further, the
number of legal possible matches is $\Theta\Bigl(\bigl(\frac{3 + \sqrt{5}}{2}\bigr)^t\Bigr)$.
Proof. For each $1 \le m \le t$, there are exactly 3 possible choices for $g_m$:

$$f_{i_1}(x_{i_1}, \Pi(x_{i_1})),\quad f_{i_2}(x_{i_2}, \Pi(x_{i_2})),\quad f_{i_3}(x_{i_3}, \Pi(x_{i_3})),$$

where $x_{i_1}$, $x_{i_2}$, and $x_{i_3}$ correspond to the three variables that occur in the 3-module $M_m$.
Therefore, there are $3^t$ possible matches for the t-3-module.

To prove the second conclusion, we divide the t-3-module into 3 parts $\mathcal{M} = (\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3)$,
where $\mathcal{M}_1 = (M_m, 1 \le m \le p)$, $\mathcal{M}_2 = (M_m, p + 1 \le m \le 3p - 1)$, and $\mathcal{M}_3 =
(M_{3p}, M_{3p+1}, M_{3p+2})$. Let $L_1$, $L_2$, and $L_3$ be the numbers of legal possible matches for
$\mathcal{M}_1$, $\mathcal{M}_2$, $\mathcal{M}_3$ respectively. Since the literals in $\mathcal{M}_1$ are variable-distinct from the literals in
$\mathcal{M}_2$, we have that the number of legal possible matches, $L$, for the t-3-module $\mathcal{M}$ satisfies

$$L_1 L_2 \le L \le 27 L_1 L_2.$$

We now estimate the order of $L_1$. To this end, we consider the probability space $(\Omega, P)$,
where $\Omega$ is the set of sequences $(g_1, \dots, g_p)$ of local fitness functions that possibly match $\mathcal{M}_1$
and $P$ is the uniform probability distribution. Then, the number of legal possible matches
is

$$L_1 = |\Omega| \cdot \Pr\{\text{a random sample from } \Omega \text{ is legal}\}. \qquad (4.5)$$

Let $g = (g_1, \dots, g_p)$ be a random sample from $\Omega$ and $x_{g_m}$ denote the main variable of the
local fitness function $g_m$; then we have

$$\Pr\{x_{g_m} = |u_m|\} = \Pr\{x_{g_m} = |u_{m+1}|\} = \Pr\{x_{g_m} = |z_m|\} = \frac{1}{3},$$

where $|u|$ denotes the variable corresponding to the literal $u$.

Let $B_m$, $0 < m \le p$, be the event that the first $m$ local fitness functions $g_1, \dots, g_m$ in
the possible match $g = (g_1, \dots, g_p)$ are mutually distinct. Since in $\mathcal{M}_1$ only consecutive
3-modules share variables, we have

$$B_m = \{(g_1, \dots, g_m) : g_i \ne g_{i+1},\ 1 \le i \le m - 1\}.$$

Let $b_m = \Pr\{g_m \ne g_{m-1} \mid B_{m-1}\}$, $m \ge 2$, and $b_1 = 1$. Notice that $B_1 = \Omega$. Then, we have

$$\begin{aligned}
\Pr\{g = (g_1, \dots, g_p) \text{ is legal}\} &= \Pr\{B_p\} \\
&= \Pr\{g_1 \ne g_2,\ g_2 \ne g_3,\ \dots,\ g_{p-1} \ne g_p\} \\
&= \Pr\{B_1\} \Pr\{g_2 \ne g_1 \mid B_1\} \cdot \Pr\{g_3 \ne g_2 \mid B_2\} \cdots \Pr\{g_p \ne g_{p-1} \mid B_{p-1}\} \\
&= b_1 b_2 \cdots b_p.
\end{aligned} \qquad (4.6)$$

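Recalling that $x_{g_m}$ denotes the main variable of the local fitness function $g_m$, we have

$$\begin{aligned}
b_p &= \Pr\{g_{p-1} \ne g_p,\ x_{g_{p-1}} = |u_p| \mid B_{p-1}\} + \Pr\{g_{p-1} \ne g_p,\ x_{g_{p-1}} \ne |u_p| \mid B_{p-1}\} \\
&= \Pr\{g_{p-1} \ne g_p \mid B_{p-1},\ x_{g_{p-1}} = |u_p|\} \cdot \Pr\{x_{g_{p-1}} = |u_p| \mid B_{p-1}\} \\
&\quad + \Pr\{g_{p-1} \ne g_p \mid B_{p-1},\ x_{g_{p-1}} \ne |u_p|\} \cdot \Pr\{x_{g_{p-1}} \ne |u_p| \mid B_{p-1}\} \\
&= \frac{2}{3} a_p + (1 - a_p) \\
&= 1 - \frac{1}{3} a_p,
\end{aligned} \qquad (4.7)$$

where $a_p = \Pr\{x_{g_{p-1}} = |u_p| \mid B_{p-1}\}$. For $a_p$, we have

$$\begin{aligned}
a_p &= \frac{\Pr\{B_{p-1},\ x_{g_{p-1}} = |u_p|\}}{\Pr\{B_{p-1}\}} \\
&= \frac{1}{\Pr\{B_{p-1}\}} \Bigl( \Pr\{B_{p-1},\ x_{g_{p-1}} = |u_p|,\ x_{g_{p-2}} = |u_{p-1}|\} + \Pr\{B_{p-1},\ x_{g_{p-1}} = |u_p|,\ x_{g_{p-2}} \ne |u_{p-1}|\} \Bigr) \\
&= \frac{1}{\Pr\{B_{p-1}\}} \Bigl( \Pr\{x_{g_{p-1}} = |u_p| \mid B_{p-1},\ x_{g_{p-2}} = |u_{p-1}|\} \cdot \Pr\{B_{p-1},\ x_{g_{p-2}} = |u_{p-1}|\} \\
&\qquad\qquad + \Pr\{x_{g_{p-1}} = |u_p| \mid B_{p-1},\ x_{g_{p-2}} \ne |u_{p-1}|\} \cdot \Pr\{B_{p-1},\ x_{g_{p-2}} \ne |u_{p-1}|\} \Bigr) \\
&= \frac{1}{\Pr\{B_{p-1}\}} \Bigl( \frac{1}{2} \Pr\{B_{p-1},\ x_{g_{p-2}} = |u_{p-1}|\} + \frac{1}{3} \Pr\{B_{p-1},\ x_{g_{p-2}} \ne |u_{p-1}|\} \Bigr).
\end{aligned} \qquad (4.8)$$

The last equality in the above formula holds because, given $B_{p-1}$ and $x_{g_{p-2}} = |u_{p-1}|$
(or $x_{g_{p-2}} \ne |u_{p-1}|$), we have two (three, respectively) choices in selecting the local fitness
function $g_{p-1}$. Consider the two terms $\Pr\{B_{p-1},\ x_{g_{p-2}} = |u_{p-1}|\}$ and $\Pr\{B_{p-1},\ x_{g_{p-2}} \ne
|u_{p-1}|\}$ in (4.8); we have

$$\begin{aligned}
\Pr\{B_{p-1},\ x_{g_{p-2}} = |u_{p-1}|\}
&= \Pr\{g_{p-2} \ne g_{p-1} \mid B_{p-2},\ x_{g_{p-2}} = |u_{p-1}|\} \cdot \Pr\{B_{p-2},\ x_{g_{p-2}} = |u_{p-1}|\} \\
&= \frac{2}{3} \Pr\{x_{g_{p-2}} = |u_{p-1}| \mid B_{p-2}\} \cdot \Pr\{B_{p-2}\} \\
&= \frac{2}{3} a_{p-1} \cdot \Pr\{B_{p-2}\}
\end{aligned} \qquad (4.9)$$

and

$$\begin{aligned}
\Pr\{B_{p-1},\ x_{g_{p-2}} \ne |u_{p-1}|\}
&= \Pr\{g_{p-2} \ne g_{p-1} \mid B_{p-2},\ x_{g_{p-2}} \ne |u_{p-1}|\} \cdot \Pr\{B_{p-2},\ x_{g_{p-2}} \ne |u_{p-1}|\} \\
&= \Pr\{x_{g_{p-2}} \ne |u_{p-1}| \mid B_{p-2}\} \cdot \Pr\{B_{p-2}\} \\
&= (1 - a_{p-1}) \cdot \Pr\{B_{p-2}\}.
\end{aligned} \qquad (4.10)$$

By plugging (4.9) and (4.10) into (4.8), we get

$$a_p = \frac{\Pr\{B_{p-2}\}}{\Pr\{B_{p-1}\}} \left( \frac{1}{3} a_{p-1} + \frac{1}{3} (1 - a_{p-1}) \right) = \frac{1}{3 b_{p-1}}.$$

This, together with (4.7), gives us

$$b_p = 1 - \frac{1}{9 b_{p-1}}. \qquad (4.11)$$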
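It is not difficult to show that the sequence $\{b_p\}$ is decreasing and lower bounded by 0.
Letting $\lim_{p} b_p = b$ and taking the limit on both sides of $b_p = 1 - \frac{1}{9 b_{p-1}}$, we get

$$b = 1 - \frac{1}{9b} \qquad (4.12)$$

and thus, $b = \frac{3 \pm \sqrt{5}}{6}$. In our case, $b = \frac{3 + \sqrt{5}}{6}$ since $b_1 = 1$. It follows that $b_p \ge b = \frac{3 + \sqrt{5}}{6}$ and
thus,

$$b_1 \cdots b_p \ge \left(\frac{3 + \sqrt{5}}{6}\right)^{p}.$$

From (4.5), we know that the number of legal possible matches is greater than

$$3^p \left(\frac{3 + \sqrt{5}}{6}\right)^{p} = \left(\frac{3 + \sqrt{5}}{2}\right)^{p}.$$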
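The behaviour of the recurrence $b_1 = 1$, $b_m = 1 - \frac{1}{9 b_{m-1}}$ is easy to check numerically. The following sketch (with the hypothetical helper name `b_sequence`) iterates the recurrence, compares the iterates with the fixed point $\frac{3 + \sqrt{5}}{6}$, and computes the ratio between the product $b_1 \cdots b_p$ and $b^p$, which should stay bounded.

```python
import math

def b_sequence(p):
    """Iterate b_1 = 1, b_m = 1 - 1/(9 b_{m-1}) and return [b_1, ..., b_p]."""
    values = [1.0]
    for _ in range(p - 1):
        values.append(1.0 - 1.0 / (9.0 * values[-1]))
    return values

b_limit = (3 + math.sqrt(5)) / 6      # larger root of 9 b^2 - 9 b + 1 = 0
bs = b_sequence(30)
ratio = math.prod(bs) / b_limit ** 30  # stays bounded as p grows
print(bs[:3], b_limit, ratio)
```

The first iterates are 1, 8/9 and 7/8, so the sequence falls quickly toward the limit of roughly 0.8727.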

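To prove that the expected number of legal possible matches $L_1$ for $\mathcal{M}_1$ is in $\Theta\Bigl(\bigl(\frac{3 + \sqrt{5}}{2}\bigr)^p\Bigr)$,
let $\alpha_p = b_p - \frac{3 + \sqrt{5}}{6} = b_p - b$. From (4.11) and (4.12), we have

$$\alpha_p = b_p - b = \frac{\alpha_{p-1}}{9\, b\, b_{p-1}} \le d\, \alpha_{p-1}, \quad 0 < d < 1,$$

which means that the series $\sum_{m=1}^{p} \alpha_m$ is convergent. It follows that

$$\left(1 + \frac{\alpha_1}{b}\right) \left(1 + \frac{\alpha_2}{b}\right) \cdots \left(1 + \frac{\alpha_p}{b}\right)$$

converges to a finite positive constant $c$. Therefore,

$$b_1 \cdots b_p = (b + \alpha_1) \cdots (b + \alpha_p) = b^p \left(1 + \frac{\alpha_1}{b}\right) \cdots \left(1 + \frac{\alpha_p}{b}\right) \le c \left(\frac{3 + \sqrt{5}}{6}\right)^{p} \qquad (4.14)$$

for sufficiently large $p$ and some constant $c$.

Similarly, we can show that the number of legal possible matches $L_2$ for $\mathcal{M}_2$ is in
$\Theta\Bigl(\bigl(\frac{3 + \sqrt{5}}{2}\bigr)^{2p}\Bigr)$. Recalling that the number of legal possible matches $L$ for the t-3-module
satisfies $L_1 L_2 \le L \le 27 L_1 L_2$, the second conclusion follows. □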

The following Lemma calculates the probability that a matching local fitness function
implies the matched 3-module.

Lemma 4.3. Given a 3-module $x \vee y \vee w,\ x \vee y \vee \bar{w}$, and a local fitness function $g$ such
that the main variable $x_g$ of $g$ is one of the three Boolean variables $|x|, |y|, |w|$, let $z =
2 + \alpha$, $0 \le \alpha \le 1$, be the parameter in the fixed ratio model $N(n, 2, z)$. Then the probability
that $g$ contains the 3-module is

$$p_0 = \frac{1}{\binom{n-1}{2}} \left( \frac{1}{28} (1 - \alpha) + \frac{6}{56} \alpha \right). \qquad (4.15)$$
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Proof: Since x g is already one of the variables in the 3-module, the probability that 
the other two variables are also in the 3-module is 



fn-l\ 



Now, assume that the variables of the local fitness function g are the same as the 
variables in the 3-module. Prom the definition of the fixed ratio model, g has two zeros 
in its fitness value assignment with probability (1 — a), and has three zeros in its fitness 
assignment with probability a. Note that the local fitness function g implies the 3-module 
x\/ y\/ w, x\/ y V w if and only if 

g(x, y, w) = and g{x, y, w) = 0. 

Prom the definition of the fixed ratio model, this happens with the probability 

J_, _ . _6_ 
^ I 1 a ) + ^ a 

The Lemma follows. □ 
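
The two counting fractions in the proof above can be reproduced by brute-force enumeration. This is a minimal sketch that assumes the zero positions of a local fitness function are placed uniformly at random among its 8 inputs; the two target assignments and the helper names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations, product

# The 8 possible inputs of a local fitness function with k = 2 (three variables).
ASSIGNMENTS = list(product((0, 1), repeat=3))


def prob_zero_set_contains(targets, num_zeros):
    """Probability that a uniformly chosen set of `num_zeros` zero positions
    contains every assignment in `targets` (uniform placement is assumed)."""
    zero_sets = list(combinations(ASSIGNMENTS, num_zeros))
    hits = sum(1 for zs in zero_sets if set(targets) <= set(zs))
    return Fraction(hits, len(zero_sets))


# Hypothetical 3-module: the two assignments falsifying its two clauses.
TARGETS = [(0, 0, 0), (0, 0, 1)]
P_TWO = prob_zero_set_contains(TARGETS, 2)    # two zeros
P_THREE = prob_zero_set_contains(TARGETS, 3)  # three zeros


def r(alpha):
    """Mixture over two zeros (prob. 1 - alpha) and three zeros (prob. alpha)."""
    return (1 - alpha) * P_TWO + alpha * P_THREE
```

By symmetry the result does not depend on which two distinct assignments are chosen as targets, and `r(alpha)` simplifies to `(1 + 2*alpha) / 28`.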

With the above preparation, we can now prove that the average number of t-3-modules
contained in $N(n, 2, z)$ tends to infinity.

Theorem 4.2. Let $A_t$ be the number of t-3-modules contained in $N(n, 2, z)$ and $t = \Theta(\ln^2 n)$.
Then, if $z = 2 + \alpha > 2.837$,

$$
\lim_{n \to \infty} E\{A_t\} = \infty. \tag{4.16}
$$

Proof: From Lemma 4.2, there are more than $\left( \frac{3 + \sqrt{5}}{2} \right)^t$ legal possible matches for a fixed
t-3-module. From Lemma 4.3, we know that each possible legal match $g = \{g_1, \cdots, g_t\}$
implies the t-3-module with probability $p_0^t$. From the proof of Theorem 10.1 in (Franco &
Gelder, 1998), there are

2 t--i n k±{n - t + 1)* (4.17) 

possible t-3-modules, where $n^{\underline{q}} = \frac{n!}{(n-q)!}$. Let $r = \frac{1}{28}(1 - \alpha) + \frac{6}{56}\alpha$, and write
$p_0 = r / \binom{n-1}{2}$. We have

E{A t } = 




(4.18) 



4(n - t + 1 
1 

An 



1(2(3 + vs w <(i-o(0 
where the fourth equation in (4.18) is due to the fact that, for positive integers $n$ and $q$ with
$q$ sufficiently small relative to $n$, the falling factorial $n(n-1)\cdots(n-q+1)$ is at most $n^q$ and at
least $n^q$ times a factor of the form $e^{-O(q^2/n)}$. It follows that $\lim_{n \to \infty} E\{A_t\} = \infty$ if

$$
2(3 + \sqrt{5})\, r > 1. \tag{4.19}
$$

Solving the inequality (4.19) gives us $\alpha > 0.837$, that is, $z = 2 + \alpha > 2.837$. This proves
Theorem 4.2. □
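
For reference, the numerical threshold can be worked out explicitly from (4.19). The derivation below uses only the expression for $r$ given in Lemma 4.3; the closed form on the last line is our own simplification.

```latex
\begin{align*}
r &= \tfrac{1}{28}(1-\alpha) + \tfrac{6}{56}\,\alpha = \frac{1 + 2\alpha}{28}, \\
2(3+\sqrt{5})\, r > 1
  &\iff 1 + 2\alpha > \frac{14}{3+\sqrt{5}} = \frac{7(3-\sqrt{5})}{2}, \\
  &\iff \alpha > \frac{19 - 7\sqrt{5}}{4} \approx 0.8369 .
\end{align*}
```

Rounded to three decimals this is the value $0.837$ quoted in the theorem.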

Based on Chebyshev's inequality, to prove that $N(n, 2, z)$ contains t-3-modules with
probability 1, we need to show that the variance of $A_t$, the number of contained t-3-modules,
is $o(E\{A_t\})$. For this purpose, we follow Franco and Gelder's approach (Lemma 4.1, Franco
& Gelder, 1998) to apply the second moment method (Alon & Spencer, 1992):

Lemma 4.4. (Alon & Spencer, 1992, Ch. 4.3, Cor. 3.5) Given a random structure (e.g., a
random CNF formula), let $W$ be the set of substructures under consideration, and let $A(w)$ be the
set of substructures sharing some clauses with $w \in W$. Let $I_w = 1$ when $w$ is in the random
structure and $0$ otherwise. If

(1) elements of $W$ are symmetric;

(2) $E\{\sum_{w \in W} I_w\} \to \infty$; and

(3) $\sum_{\bar{w} \in A(w)} \Pr(\bar{w} \mid w) = o\left( E\{\sum_{w \in W} I_w\} \right)$, for each $w \in W$,

then as $n \to \infty$, the probability that the random structure contains a substructure tends to
1.

To use the above Lemma to study the 2-SAT sub-problem in NK landscapes, we view
the random structure as a random instance of $N(n, 2, z)$, and take $W$ to be the set of all
t-3-modules, which are symmetric by their definitions (Sections 5 and 10, Franco & Gelder,
1998).

Theorem 4.3. If $z = 2 + \alpha > 2.837$, then $N(n, 2, z)$ is asymptotically insoluble with
probability 1.

Proof: Let $A_t$ be the number of t-3-modules implied by $N(n, 2, z)$ and $t = \Theta(\ln^2 n)$.
Theorem 4.2 shows that $\lim_{n \to \infty} E\{A_t\} = \infty$. By Lemma 4.4, it is enough to show that for
each $w \in W$,

$$
\sum_{\bar{w} \in A(w)} \Pr(\bar{w} \mid w) = o(E\{A_t\}), \tag{4.20}
$$

where $\Pr(\bar{w} \mid w)$ is the conditional probability that $N(n, 2, z)$ implies the t-3-module $\bar{w}$
given that it implies $w$, and $A(w)$ is the set of all t-3-modules sharing some clauses with $w$.

Suppose that $\bar{w}$ shares $Q$, $1 \le Q \le 2t$, clauses with $w$, and that these $Q$ clauses are
distributed among $q$ 3-modules. Further, let $q_1$ be the number of 3-modules whose two
clauses are both shared and $q_2 = q - q_1$ the number of 3-modules that have only one clause
shared.

Let $T_1$ be a 3-module in $\bar{w}$ that shares exactly one clause with a 3-module $T_2$ in $w$.
We claim that the conditional probability that $T_1$ is implied by $N(n, 2, z)$, given that $w$ is
implied by $N(n, 2, z)$, is

$$
\frac{1}{6}\alpha + O\left( \frac{1}{n} \right). \tag{4.21}
$$

Without loss of generality, assume that $T_1$ and $T_2$ are both over the variables $|x|$, $|y|$, $|u|$, so
that $T_1$ consists of the shared clause and one further clause that is not in $T_2$.
Since $w$ is implied by $N(n, 2, z)$, there is a local fitness function $g = g(|x|, |y|, |u|)$ that implies
$T_2$. The conditional probability that $T_1$ is implied is less than or equal to $P_1 + P_2$, where
$P_1$ is the conditional probability that $g$ also implies the non-shared clause of $T_1$ given that $g$
implies $T_2$, and $P_2$ is the conditional probability that this clause is implied by other local
fitness functions. By the definition of $N(n, 2, z)$, we have that $P_1 = \frac{1}{6}\alpha$. Since a local fitness
function implies this clause only if it has the same variables as $g = g(|x|, |y|, |u|)$, we have
that $P_2 = O(\frac{1}{n})$. The claim is proved. It follows that, for sufficiently large $n$,

$$
\Pr\{\bar{w} \mid w\} < c \left( \frac{3 + \sqrt{5}}{2}\, p_0 \right)^{t-q} \cdot 1^{q_1} \cdot \left( \frac{1}{6}\alpha \right)^{q_2}, \tag{4.22}
$$

where $p_0$ is defined in Lemma 4.3 and $c$ is a fixed constant.

Let $A_{Q,q,q_2}(w)$ be the set of t-3-modules that share $Q$ clauses with $w$ such that these $Q$
clauses are distributed over $q$ different 3-modules. As before, $q_1$ is the number of 3-modules
whose two clauses are both shared and $q_2 = q - q_1$ the number of 3-modules that have only
one clause shared. We claim that

$$
|A_{Q,q,q_2}(w)| = |A_{2q,q,0}(w)|\, 6^{q_2}, \tag{4.23}
$$

where $A_{2q,q,0}(w)$ is the set of t-3-modules that share all the $2q$ clauses in the $q$ 3-modules
with $w$. Let $M = \{M_1, \cdots, M_t\}$ be a t-3-module in which all the clauses in $M_i$, $1 \le i \le q$,
are shared with $w$. Let $\bar{M} = \{\bar{M}_1, \cdots, \bar{M}_t\}$ be a t-3-module in which all the clauses in
$\bar{M}_i$, $1 \le i \le q_1$, are shared and each of the 3-modules $\bar{M}_i$, $q_1 + 1 \le i \le q_1 + q_2$, has only one
clause shared. Since for each of the $q_2$ 3-modules, we have 6 ways to choose the non-shared
clauses, there are $6^{q_2}$ such t-3-modules $\bar{M}$ in $A_{Q,q,q_2}(w)$ that correspond to one t-3-module
$M$ in $A_{2q,q,0}(w)$. The claim follows. From formulas (55) and (56) in (Franco & Gelder, 1998)
and (4.23), it follows that

on) \ \, t ? (4.24) 

Oil) 2 t-q n 2(t- q )Q q2 q>p+ l m V > 

71 

Let $r = \frac{1}{28}(1 - \alpha) + \frac{6}{56}\alpha$, and write $p_0 = r / \binom{n-1}{2}$. Then, we have

\A Q ^ q2 (w)\Pr{w I w} 

< ^) 2 *-% 2 (*-^6 ( ' 2 (^^po) < -' ? (-«)' ?2 
ra^ 2 6 



("i 1 ) 



< 



O(i) 



n 4n 



-(2(3 + ^5)0)*"" 



< 2W S {A t }(2(3 + V5)r))-«, < + 3 
and

$$
|A_{Q,q,q_2}(w)|\, \Pr\{\bar{w} \mid w\} < O(1)\, E\{A_t\}\, \left( 2(3 + \sqrt{5})\, r \right)^{-q}, \quad q > p + 3. \tag{4.26}
$$

Therefore,

^2 Pr(w\w)= ^ \ A Q,q,q2( w )\ Pr {™ I w ) 

= E E E^^}( 2 ( 3 + ^)) _<? + E E E°( 1 )^}( 2 ( 3 + ^))" <? - 

Q=lq<p+3 q2 Q=l q>p+3 g 2 

(4.27) 

Since $2(3 + \sqrt{5})\, r > 1$ for $z > 2.837$, we have

Pr{w I w) < ( ^-E{A t } + t 3 E{A t }(4r)-t p+ V 
weA( w ) "' (4.28) 

= o(E{A t }). 

This completes the proof of Theorem 4.3. □ 
5. Experiments 

Our study of the threshold phenomena in NK landscapes started with an experimental
investigation. Many of the theoretical results in the previous section are motivated by
the observations made in our experiments. In this section, we describe the approach and
methods we used in the experimental study, and report the results and observations we
have made.

In our experiments, an instance of the NK landscape decision problem is converted to an
equivalent 3-SAT problem, and then the 3-SAT problem is solved using Roberto Bayardo's
relsat, an enhanced version of the famous Davis-Putnam algorithm for SAT problems
implemented in C++. The source code of relsat can be found at http://www.cs.ubc.ca/~hoos/SATLIB/.

In the experiments, we generated random instances of the NK landscape decision problem
from the random model N(n, 2, z). As a result, the equivalent SAT problem for each
random NK landscape instance is a 3-SAT problem with n variables and (on average) zn
clauses. By definition, the parameter z is between 0 and 8. For z < 1, the 3-SAT instance
can be solved easily by setting the literals that correspond to the main variables of the
local fitness functions to true. As z increases, we get more and more clauses and the 3-SAT
problem becomes more and more constrained. The aims of the experiments are threefold:
(1) investigating whether there exists a threshold phenomenon in the random NK landscape
model; (2) locating the threshold of the parameter z; and (3) determining whether there are
any hard instances around the threshold.
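
The generation and conversion steps described above can be sketched as follows. This is an illustrative sketch, not the code used in the experiments: it assumes that each local fitness function depends on its main variable and two other variables chosen uniformly at random, that a function receives floor(z) zeros with probability 1 - alpha and floor(z) + 1 zeros with probability alpha (alpha being the fractional part of z), and that the zeros are placed uniformly among the 8 inputs. The names `random_nk_instance` and `to_3sat` and the DIMACS-style signed-integer literals are our own choices.

```python
import random
from itertools import product


def random_nk_instance(n, z, rng):
    """Sample an instance of N(n, 2, z) as a list of (variables, zeros) pairs,
    where `zeros` lists the inputs on which the local fitness function is 0."""
    base = int(z)        # floor(z) zeros with probability 1 - alpha
    alpha = z - base     # floor(z) + 1 zeros with probability alpha
    inputs = list(product((0, 1), repeat=3))
    functions = []
    for i in range(n):
        # Two distinct neighbours different from the main variable i.
        others = [j if j < i else j + 1 for j in rng.sample(range(n - 1), 2)]
        variables = (i, *others)  # main variable first
        num_zeros = base + (1 if rng.random() < alpha else 0)
        zeros = rng.sample(inputs, num_zeros)
        functions.append((variables, zeros))
    return functions


def to_3sat(functions):
    """Turn every zero of a local fitness function into one 3-clause.

    The clause forbids exactly the zero assignment: a variable set to 1 in
    that assignment appears negated, a variable set to 0 appears positive.
    Variable v is written as the integer v + 1, its negation as -(v + 1)."""
    clauses = []
    for variables, zeros in functions:
        for bits in zeros:
            clauses.append(
                tuple(-(v + 1) if b else (v + 1) for v, b in zip(variables, bits))
            )
    return clauses


if __name__ == "__main__":
    instance = random_nk_instance(512, 2.83, random.Random(0))
    print(len(to_3sat(instance)) / 512)  # clauses per variable, close to z on average
```

The resulting clause list can be written out in any format a SAT solver accepts; with this encoding the number of clauses per variable averages z, matching the description above.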

5.1 Experiments on the Fixed Ratio Model 

In this part of the experiments, we generate 100 random instances of N(n, 2, z) for each
of the parameters $n = 2^9, \ldots, 2^{16}$ and z = 2.71, 2.72, ..., 3.00. These instances are then
converted to 3-SAT instances and solved by relsat. Figure 1 shows the fraction of insoluble
instances as a function of the parameter z. It can be seen that there exists a threshold
phenomenon and the threshold is around 2.83. This shows that our upper bound z = 2.837
is very tight.

Figure 1: Fractions of insoluble instances (Y-axis) as a function of z (X-axis).

In Figure 2, we plot the square root of the average search cost as a function of the
parameter n. The figure indicates that the average search cost is in $O(n^2)$ for any parameter
z. We have also observed that more than 99 percent of the insoluble instances are solved
quickly in the preprocessing stage of relsat. This indicates that there must be some "small"
structures that make the instances insoluble. More detailed experimental results can be
found in Gao's thesis (Gao, 2001).

5.2 Experiments on the 2-SAT sub-Problem 

This is the part of the experiments that motivated our theoretical analyses in Section 4.2.
The idea can be explained as follows. Let

$$
f(x) = \sum_{i=1}^{n} f_i(x_i, \Pi(x_i))
$$

be an instance of the decision problem of NK landscape and

$$
\varphi = C_1 \wedge C_2 \wedge \cdots \wedge C_n
$$

the equivalent 3-SAT problem, where $C_i$ is the set of 3-clauses equivalent to the local fitness
function $f_i$. For each $i$, there is a set of 2-clauses $D_i$ (possibly empty) implied by $C_i$. For
example, if $C_i$ has three 3-clauses $((x, y, z), (x, \bar{y}, z), (x, y, \bar{z}))$, then the set of 2-clauses $D_i$
would be $((x, z), (x, y))$. The conjunction of the $D_i$, denoted by $\psi$, is a 2-SAT problem. It is
obvious that the original 3-SAT problem $\varphi$ is satisfiable only if the 2-SAT sub-problem $\psi$ is
satisfiable. In the experiment, we generate instances of the NK landscape N(n, 2, z), convert
them to the equivalent 3-SAT problems, and extract the 2-SAT sub-problems. These 2-SAT
problems are then solved by the relsat solver. If the 2-SAT problem is unsatisfiable, then
the original NK landscape instance is also insoluble.

Figure 2: Square root of the average search cost (Y-axis, in seconds) as a function of n
(X-axis).

Figure 3: Fractions of insoluble instances (Y-axis) as a function of z (X-axis) for 2-SAT
sub-problems.

Figure 4: Square root of the average search cost (Y-axis, in seconds) as a function of n
(X-axis) for 2-SAT sub-problems.
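
The extraction of the 2-SAT sub-problem can be illustrated with a short sketch. It assumes that the implied 2-clauses are obtained by resolving pairs of 3-clauses over the same variables that differ in the sign of exactly one literal, and it swaps in the standard implication-graph test for 2-SAT (strongly connected components) in place of relsat, which is what the experiments actually used. Clauses are tuples of signed integers, and the function names are ours.

```python
from collections import defaultdict
from itertools import combinations


def implied_2clauses(clauses):
    """Resolve pairs of 3-clauses that differ in the sign of exactly one literal."""
    by_vars = defaultdict(set)
    for c in clauses:
        by_vars[frozenset(abs(l) for l in c)].add(frozenset(c))
    two = set()
    for group in by_vars.values():
        for c1, c2 in combinations(group, 2):
            diff = c1 - c2
            if len(diff) == 1 and -next(iter(diff)) in c2:
                two.add(c1 & c2)  # the two literals common to both clauses
    return two


def two_sat_satisfiable(two_clauses):
    """Unsatisfiable iff some literal and its negation share a strongly
    connected component of the implication graph (Kosaraju's algorithm)."""
    graph, rgraph, nodes = defaultdict(list), defaultdict(list), set()
    for clause in two_clauses:
        a, b = tuple(clause)
        for u, v in ((-a, b), (-b, a)):  # (a or b) gives not a -> b, not b -> a
            graph[u].append(v)
            rgraph[v].append(u)
        nodes.update((a, -a, b, -b))

    # First pass: order nodes by DFS finishing time (iterative DFS).
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(graph[s]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Second pass: label components on the reversed graph.
    comp = {}
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            node = stack.pop()
            for nxt in rgraph[node]:
                if nxt not in comp:
                    comp[nxt] = s
                    stack.append(nxt)
    return all(comp[x] != comp[-x] for x in nodes)
```

With x, y, z encoded as 1, 2, 3, the three clauses of the example in the text yield exactly the two 2-clauses (x, z) and (x, y). If `two_sat_satisfiable` returns `False` for the extracted sub-problem, the original instance is insoluble.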

The experimental settings are the same as those in the experiment on the original
problem. The results are shown in Figures 3-4, which parallel Figures 1-2 for the original
3-SAT problems in Section 5.1. We see that the patterns of insoluble fractions
and search cost are similar to those we found in the original 3-SAT problems. There is a
soluble-insoluble phase transition occurring around 2.83, but the fraction of unsatisfiable
instances is lower than the fraction in the original 3-SAT problems.

We also observed that the average search cost for the 2-SAT sub-problems remains the 
same as that for the original 3-SAT problems. This tells us that the difficulty of solving a 
soluble instance of NK landscape is almost the same as that of solving a 2-SAT problem, and 
hence is easy. Therefore, on average the NK landscape N(n, 2, z) is also easy at parameters 
below the threshold where almost all of the instances are soluble. 

6. Implications and Conclusions 

One of the questions that arises about this work is its implications for the design and
analysis of genetic algorithms. NK landscapes were initially conceived as simplified models
of evolutionary landscapes which could be tuned with respect to ruggedness and epistatic
interactions (Kauffman, 1989). In the study of genetic algorithms, NK landscape models
have been used as a prototype and benchmark in the analysis of the performance of
different genetic operators and the effects of different encoding methods on the algorithm's
performance (Altenberg, 1997; Hordijk, 1997; Jones, 1995). Kauffman (1993) points out
that the parameters that primarily affect a number of ruggedness measures are n and k.

Gao & Culberson

Nevertheless, the fact that for $k \ge 2$ the discrete NK landscape is NP-complete (Wright
et al., 2000) when the neighbors are arbitrarily chosen could be construed as implying that
random landscapes with fixed k are in practice hard.

The results in this paper should serve as a cautionary note that this may not be the 
case. Our analyses show that for fixed k the uniform probability model is trivially solvable 
as the problem size tends to infinity. For the fixed ratio model, we have derived two upper 
bounds for the threshold of the solubility phase transition, and proved that the problem 
with the control parameter above the upper bounds can be solved in polynomial time with 
probability asymptotic to 1 due to the existence of easy sub-problems such as 2-SAT. A 
series of experiments has also been conducted to investigate the hardness of the problem 
with the control parameters around and below the threshold. From the experiments, we 
have observed that the problem is also easy around and below the threshold. 

Our proofs hold only for the decision version of the problem where the component
functions are discrete on {0, 1}. The proofs are obtained by noticing that the clustering of
functions, or clauses, on selected subsets of variables implies that the overall problem is
decomposable into independent subproblems, or that the problem contains small
substructures that identify the solution. The subproblems are the components of the connection
graph defined in Section 3 and the 2-SAT sub-problems studied in Section 4.2. It is
currently unclear to us to what extent our analysis can be extended to the optimization version
of the NK model, and we would like to study this problem further in the future.

In response to the question 'what are the implications for GAs?' we suggest the following 
speculative line of enquiry. For the discrete model we use, the soluble instances are readily 
solved by a standard algorithmic approach based on recognizing the components of the 
connection graph. (This should not be a surprise for us as it has been pointed out by 
Heckendorn, Rana, and Whitley (1999) that 'Even relatively old algorithms such as Davis- 
Putnam which are deterministic and exact are orders of magnitude faster than GAs'.) 1 A 
similar connectivity can be developed for real valued distributions, for example by capping 
the minimum value which we allow a sub-function to take. We can speculate that the 
clustering imposed by fixed values of k would also generate localized structures when real 
values are applied and when considering optimization instead of decision, but perhaps with 
fuzzy boundaries. In fact, this observation is just the flip side of limited epistasis. Genetic 
algorithms, or their variants such as the probabilistic model-building algorithms (Larranaga 
& Lozano, 2001), designed to mimic natural evolution, are supposed to take advantage of 
this situation. So, to the extent that NK landscapes are an accurate reflection of the 
features exploited by evolutionary algorithms, we pose the following question. Is it possible 
to identify these fuzzy components if they exist, and in doing so design an algorithm that 
exploits the same landscape features that the evolutionary algorithms do, but far more 
efficiently, as we have done for the uniform discrete decision problem? 

1. We thank an anonymous referee for pointing out to us the work of Heckendorn et al. (1999).
These landscapes were designed with the intent of studying limited interactions, and our
results can also be seen as a confirmation that indeed limited epistasis leads to easier
problems. In another domain, that of the more traditional research into search and optimization,
there is a need for test bed problems with real world connections which are tunable with
respect to difficulty. NK landscapes might have been such a domain for generating 3-SAT
instances. It is disappointing that for restricted k the instances generated are easy with
high probability.